NSF PAR Search | NSF Public Access Repository

Adaptively profiling models with task elicitation

https://doi.org/10.18653/v1/2025.emnlp-main.1270

Brown, Davis; Balehannina, Prithvi; Jin, Helen; Havaldar, Shreya; Hassani, Hamed; Wong, Eric (November 2025, Association for Computational Linguistics)

Language model evaluations often fail to characterize consequential failure modes, forcing experts to inspect outputs and build new benchmarks. We introduce task elicitation, a method that automatically builds new evaluations to profile model behavior. Task elicitation finds hundreds of natural-language tasks—an order of magnitude more than prior work—where frontier models exhibit systematic failures, in domains ranging from forecasting to online harassment. For example, we find that Sonnet 3.5 over-associates quantum computing and AGI and that o3-mini is prone to hallucination when fabrications are repeated in-context.
Full Text Available
Probabilistic Soundness Guarantees in LLM Reasoning Chains

https://doi.org/10.18653/v1/2025.emnlp-main.382

You, Weiqiu; Xue, Anton; Havaldar, Shreya; Rao, Delip; Jin, Helen; Callison-Burch, Chris; Wong, Eric (November 2025, Association for Computational Linguistics)

In reasoning chains generated by large language models (LLMs), initial errors often propagate and undermine the reliability of the final conclusion. Current LLM-based error detection methods often fail to detect propagated errors because earlier errors can corrupt judgments of downstream reasoning. To better detect such errors, we introduce Autoregressive Reasoning Entailment Stability (ARES), a probabilistic framework that evaluates each reasoning step based solely on previously-verified premises. This inductive method yields a nuanced score for each step and provides certified statistical guarantees of its soundness, rather than a brittle binary label. ARES achieves state-of-the-art performance across four benchmarks (72.1% Macro-F1, +8.2 points) and demonstrates superior robustness on very long synthetic reasoning chains, where it excels at detecting propagated errors (90.3% F1, +27.6 points).
Full Text Available
Sum-of-Parts: Self-Attributing Neural Networks with End-to-End Learning of Feature Groups

You, Weiqiu; Xue, Anton; Havaldar, Shreya; Rao, Delip; Jin, Helen; Callison-Burch, Chris; Wong, Eric (July 2025, PMLR)

Self-attributing neural networks (SANNs) present a potential path towards interpretable models for high-dimensional problems, but often face significant trade-offs in performance. In this work, we formally prove a lower bound on the error of per-feature SANNs and show that group-based SANNs, by contrast, can achieve zero error and thus high performance. Motivated by these insights, we propose Sum-of-Parts (SOP), a framework that transforms any differentiable model into a group-based SANN, where feature groups are learned end-to-end without group supervision. SOP achieves state-of-the-art performance for SANNs on vision and language tasks, and we validate that the groups are interpretable on a range of quantitative and semantic metrics. We further validate the utility of SOP explanations in model debugging and cosmological scientific discovery.
Full Text Available
The FIX Benchmark: Extracting Features Interpretable to eXperts

Jin, Helen; Havaldar, Shreya; Kim, Chaehyeon; Xue, Anton; You, Weiqiu; Qu, Helen; Gatti, Marco; Hashimoto, Daniel A; Jain, Bhuvnesh; Madani, Amin; et al (June 2025, Journal of Data-centric Machine Learning Research)

Feature-based methods are commonly used to explain model predictions, but these methods often implicitly assume that interpretable features are readily available. However, this is often not the case for high-dimensional data, and it can be hard even for domain experts to mathematically specify which features are important. Can we instead automatically extract collections or groups of features that are aligned with expert knowledge? To address this gap, we present FIX (Features Interpretable to eXperts), a benchmark for measuring how well a collection of features aligns with expert knowledge. In collaboration with domain experts, we propose FIXScore, a unified expert alignment measure applicable to diverse real-world settings across cosmology, psychology, and medicine domains in vision, language, and time series data modalities. With FIXScore, we find that popular feature-based explanation methods have poor alignment with expert-specified knowledge, highlighting the need for new methods that can better identify features interpretable to experts.
Full Text Available
Automatically Generated Summaries of Video Lectures May Enhance Students’ Learning Experience

https://doi.org/10.18653/v1/2023.bea-1.31

Gonzalez, Hannah; Li, Jiening; Jin, Helen; Ren, Jiaxuan; Zhang, Hongyu; Akinyele, Ayotomiwa; Wang, Adrian; Miltsakaki, Eleni; Baker, Ryan; Callison-Burch, Chris (July 2023, Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023))

We introduce a novel technique for automatically summarizing lecture videos using large language models such as GPT-3, and we present a user study investigating how adding automatic summaries to lecture videos affects the studying experience. We test students under different conditions and find that students who are shown a summary next to a lecture video perform better on quizzes designed to test the course materials than students who have access only to the video or only to the summary. Our findings suggest that adding automatic summaries to lecture videos enhances the learning experience. Qualitatively, students preferred summaries when studying under time constraints.
Full Text Available
Large‐scale, image‐based tree species mapping in a tropical forest using artificial perceptual learning

https://doi.org/10.1111/2041-210X.13549

Tang, Chengliang; Uriarte, María; Jin, Helen; Morton, Douglas; Zheng, Tian (April 2021, Methods in Ecology and Evolution)
Ellison, Aaron (Ed.)
Full Text Available
